The impact of a mobile app-based corporate sleep health improvement program on productivity: Validation through a randomized controlled trial

Based on a randomized controlled trial applied to employees of a manufacturing company, this study examines the extent to which a corporate sleep program improves workers’ sleep health and productivity. In the three-month sleep improvement program, applicants were randomly divided into a treatment group and a control group, and the treatment group was provided with a noncontact sensing device to visualize their sleep. A smartphone app linked to the device notified them of their sleep data every morning and presented them with advice on behavioral changes to improve their sleep on a weekly basis. The results of the analysis revealed the following. First, even after controlling for factors that may cause sleep disturbances and nocturnal awakenings, such as increased workload and the number of days spent working from home during the measurement period, the treatment group showed improved sleep after the program compared to the control group. Second, the treatment group showed statistically significant improvement in presenteeism (productivity). The effect size on presenteeism through sleep improvement was similar regardless of the estimation method used (i.e., ANCOVA estimator of ATT and two 2SLS methods were performed). In particular, we confirmed that productivity was restored through sleep improvement for the participants who diligently engaged in the program. These results suggest that promoting sleep health using information technology can improve sleep deficiency and restore productivity.


Introduction
In modern society, a significant proportion of people are said to have problems with sleep, and sleep deprivation or sleep deficiency has become a global concern. For example, according to the Sleep Foundation [1], 44% of respondents in the United States reported that they experience sleep problems almost every day. The Comprehensive Survey of Living Conditions (Ministry of Health, Labour and Welfare, 2019) also reports that approximately 30% of Japanese adults get little or no rest from sleep (see also Kitamura et al. [2] and Ikeda et al. [3] for Japan, Stranges et al. [4] for Asia and Africa and van de Straat [5] for European countries). Many of these people may regard sleep deficiency as an inevitable part of life; however, if the lack of adequate sleep results in a loss of productivity, it is a major economic loss. The purpose of our paper is to investigate the extent to which the decline in productivity is caused by inadequate sleep and whether productivity can be improved if the quantity and quality of sleep are improved.
However, since most previous studies described above were based on cross-sectional data, identifying the causality between sleep and productivity is still a challenge. Just as there is an endogeneity problem in identifying the impact of sleep on health, the relationship between sleep and productivity is not clear (see, for example, Anderson and Bradley [18], Finan et al. [19], Johannessen and Sterud [20] and Cho and Chen [21]). In other words, rather than causality running from lack of sleep to productivity declines, there may be a reverse causality in which workers with low productivity are unable to complete their tasks on time and work longer hours, resulting in less sleep (see dotted arrow in Fig 1). As a result, a growing number of sleep studies have used randomized controlled trials (RCTs) to identify causal relationships (for example, Kaku et al. [22], Nishinoue et al. [23], Nakada et al. [24], Omeogu et al. [25], Okajima et al. [26] and Kjørstad et al. [27]; see also Robbins et al. [28] for a systematic review). As shown in Fig 1, if the intervention is randomly assigned, which means that the treatment a subject receives is not correlated with the subject's unobservable characteristics, we can examine the causal effect of sleep on productivity through the intervention (the bold arrow). Furthermore, to our knowledge, previous intervention studies have mainly focused on patients who suffer from sleep disorders such as insomnia, and there have been only a few sleep intervention studies for general workers (see Burton et al. [29] and Redeker et al. [30]). While it is urgent to improve the sleep of patients suffering from insomnia, it is also true that there are many general workers with sleep deficiency (but not necessarily those with insomnia), even in the unwell stage, as mentioned above. We believe that more studies are needed on sleep improvement and the productivity of general workers using the RCT framework (Boubekri et al. [31]).
In this paper, by introducing the concept of sleep health proposed by Buysse [32], we investigate whether the improvement of sleep health improves work productivity through a randomized controlled trial. Sleep health is a concept that enables not only patients with sleep problems but also all people to achieve well-being and good performance through sleep. Specifically, by employing several dimensions of sleep health measures proposed by Buysse [32] that are related to health and performance outcomes, we conducted a 3-month RCT of sleep improvement for 215 employees of a large Japanese manufacturing company to determine whether employees who participated in a sleep improvement program gain sleep health compared to employees who do not participate and to what extent productivity improves when sleep health improves. In the sleep improvement program, the treatment group is provided with a noncontact sensing device that measures daily sleep data over a three-month period, notifies the person of the data every morning to visualize his or her sleep, and provides weekly advice on sleep improvement to encourage behavioral change through a smartphone app.
The contributions of this paper are as follows. First, we conducted an RCT in the general workforce to investigate the extent to which sleep health improves and work productivity improves as sleep health improves. As previous studies in sleep health have positioned presenteeism as a measure to capture productivity loss, we also consider presenteeism as productivity and use two measures in a self-response format: a composite indicator using 15 productivity-related questions in our original survey and the Work Limitations Questionnaire (WLQ) developed by Tufts University. Second, although there is a growing body of research demonstrating the impact of sleep health on physical and mental health (for example, Dong et al. [33], Lee and Lawson [34]), to the best of our knowledge, few studies have conducted RCTs on the relationship between improved sleep health and productivity. For sleep health, we selected six important aspects of sleep that may affect physical and mental wellness and performance: sleep duration, satisfaction/quality, alertness/sleepiness, efficiency of sleep, and timing and regularity as measures of sleep health based on Dong et al. [33] and Lee and Lawson [34]. Third, since we limit the subjects of our analysis to employees who work for the same company, there is no need to consider differences between employees working for different companies, such as management philosophy, corporate culture, corporate performance, and many other factors that may affect sleep. In addition to basic personal attributes to control for the heterogeneity among workers within the firm, we also control for a variety of factors that may affect sleep, including work-related factors (such as work tasks, the level of support from superiors and colleagues at the workplace, the number of days spent working from home, transfers and promotions) and major personal events (marriage, childbirth, and bereavement) that occurred during the three months of the program period. Fourth, we focus on the treatment group and examine whether the degree of sleep improvement differs depending on the attributes and environment when RCTs are conducted.

2-1. Randomized controlled trials
The randomized controlled trial of the sleep-improvement program analyzed in this paper was conducted among employees of a publicly traded Japanese manufacturing firm with more than 10,000 employees. The program was implemented over a six-month period from November 1, 2020, to the end of April 2021. Using a crossover RCT design, all subjects were randomly assigned to participate in one of the two periods (period 1: November to January, period 2: February to April). Of 215 participants who applied for the program, 157 were assigned to period 1 (treatment group) and 58 to period 2 (control group). The allocation to the two groups was unequal because of the firm's decision to assign at least two-thirds of participants to the treatment group. Outcomes were assessed for the first period, with those assigned to period 2 serving as the control group.
Note that our datasets are proprietary and obtained in a legally restricted manner under confidentiality agreements with the firm and therefore cannot be made publicly available. The research team was provided with the anonymized data after the firm obtained written consent from all participants for the data to be used for academic research purposes. Specifically, this project was an opt-in, rather than mandatory, recruitment process. The firm clearly stated that the data collected would be analyzed within the firm to improve employees' sleep and that once the internal analysis was completed, the anonymized data would be made available to Waseda University for academic research. Employees were also given the option to refuse to provide their data, but all participants in the project gave their written consent to provide their data. The authors notified the Ethics Review Committee of Waseda University that the program would be implemented by the firm and that the data would be provided for secondary use after the firm conducted an internal evaluation. The committee issued a decision indicating that no ethical review was required.
Participants were provided with noncontact sensing devices from NeuroSpace Co., Ltd. during the term assigned. NeuroSpace is a Japanese venture company that develops sleep-sensing technology and simple evaluation algorithms and supports corporate health management through sleep improvement programs. The device, developed by EarlySense, an Israeli company, is placed under the bed mat and measures sleep status (bedtime, sleep latency, REM sleep, non-REM sleep, light sleep, number of awakenings, time of waking up, and actual time slept) based on heart rate and tossing and turning information. Note that Tal et al. [35] performed an epoch-by-epoch comparison of the EarlySense device and polysomnography in the sleep laboratory setting and reported that the system showed sleep detection sensitivity, specificity, and accuracy of 92.5%, 80.4%, and 90.5%, respectively. The daily sleep information obtained by the sleep measurement device is sent to a special app installed on the participant's own smartphone, allowing the participant to check the daily sleep information (for details, see S1, S2 Appendices in S1 File). The participants received three elements of intervention during the three-month program: (1) they were able to check visualized information about their own sleep status every morning, (2) each week, they received a list of recommendations on how to improve their sleep and chose several items to achieve from the list, and (3) their compliance was checked every night (advice was not just provided, but behavioral changes were encouraged by the app's daily confirmation of the achievement status).
In addition, two seminars, which aimed to explain the app's advice in detail and to provide knowledge about sleep hygiene, were held during the three-month program in period 1. However, participation in the seminars was voluntary, and to ensure fairness, the firm allowed not only the treatment group but also the control group to participate in the seminars. Since the data also identify the IDs of individuals who actually participated in the first and second seminars in both groups, the following analysis controls for seminar participation.

2-2. Data
For both the treatment and control groups, we used responses to self-administered surveys conducted before the start of the program (the end of October 2020; the baseline survey) and after implementation in period 1 (the end of January 2021; the follow-up survey).
To measure the effect of the intervention, only the sample that responded to both the baseline and follow-up surveys was used in the following analysis. Because there were several participants who left the program or did not return the questionnaire, the final sample of participants who completed the follow-up survey included 145 observations in the treatment group and 57 observations in the control group (see Fig 2). The attrition rate was not large, at 7.6% and 1.7% for the treatment and control groups, respectively. There was no statistically significant difference in composition between the two groups at the beginning of the RCT. However, since several subjects dropped out after the RCT, we further checked possible differences between the two groups using the final sample of 202 participants. Table 1 compares the composition between the two groups. The rightmost column of Table 1 shows the results of the tests for differences between the two groups (t tests or chi-squared test) with p values, which indicate that there was no statistically significant difference between the two groups.
2-2-1. Sleep-related behaviors.
The baseline and follow-up surveys asked essentially the same nine questions about sleep-related behaviors, which were used in the analysis to examine whether the program resulted in changes in participants' behaviors that led to improved sleep. The specific questions are indicated in S1 Table in S1 File. Variables were created so that the greater the number was, the higher the frequency of each behavior.

2-2-2. Sleep health measures.
To test the extent to which the intervention improves sleep health, we used the following six sleep health scale items in accordance with Buysse [32], Dong et al. [33], and Lee and Lawson [34]: sleep satisfaction, daytime alertness/sleepiness, sleep efficiency, sleep duration, timing, and regularity. The specific questions used to measure these scale items are described in Table 2. Using them, we created two sleep health scores, SH1 and SH2. SH1 is a simple summation of dummy variables created from the six items with a 4-point scale dichotomized into 1 and 0 (see Table 2 for details). SH2 is a composite index created using correspondence analysis applied to eleven 4-level category variables (except "timing") related to sleep health (see also Table 2 for details). The index is created in such a way that the larger the value is, the better the sleep, and it is normalized so that one unit corresponds to one standard deviation. SH1 is similar to the scores used in Dong et al. [33] and Lee and Lawson [34] and is therefore comparable to those of previous studies. SH1, however, may have problems assessing an appropriate duration of sleep since it treats short and long sleep durations equally. As pointed out by several previous studies, the effects of long sleep duration are mixed (see, for example, Watson et al. [36]). Therefore, it is better to measure the sleep duration in a nonsymmetric form, and we believe that the correspondence analysis used to create SH2 can capture the nonsymmetric effect.

2-2-3. Productivity.
We use two indices of productivity, Productivity indices 1 and 2, to examine to what extent work productivity improved as a result of improvement in sleep health through the sleep improvement program. To create the former, we use the responses to the 15 questions provided in the baseline and follow-up surveys (the questions are described in S1 Table in S1 File).
Respondents were asked to answer each of these questions on a 10-point Likert scale, with higher numbers indicating a greater frequency. Responses to these 15 items are transformed into a composite variable, Productivity index 1, by correspondence analysis. As the second productivity index, we use presenteeism data based on the Work Limitations Questionnaire (WLQ; see Lerner et al. [37,38]). The data were provided by the firm, which conducts the WLQ survey of all employees every year in September and October. The WLQ has been translated into multiple languages, and the WLQ-J, which is a Japanese translation of the WLQ, was used in this study (for an explanation of the WLQ-J, see Ida et al. [39]). Since the survey was conducted just prior to the implementation of the sleep improvement program, the firm administered the WLQ survey to the program participants (both treatment and control groups) after period 1 (at the end of January 2021) to verify the effectiveness of the program.
As in Productivity index 1, the higher the value in Productivity index 2 is, the higher the productivity. The WLQ has four subscales (time management, physical tasks, mental-interpersonal tasks, and output tasks) and a total scale that combines them.
The number of participants who responded to both baseline and follow-up WLQ surveys was 114 in the treatment group and 39 in the control group. Using these samples, we also conducted tests of the difference in composition as well as Productivity indices 1 and 2 between the two groups before the RCT and confirmed that there was no statistically significant difference.

Table 2 (partial).
(row label and question not recovered)
  SH1: Used as a dummy variable that takes 1 if answer is "very satisfied"/"satisfied" and 0 if "not very satisfied"/"not satisfied at all"
  SH2: Used as a 4-level category variable: "very satisfied" 4, "satisfied" 3, "not very satisfied" 2, and "not satisfied at all" 1
daytime alertness
  Question: "Do you ever feel sleepy, such as yawning, drowsy, or tired within 4 hours of waking?"
  SH1: Used as a dummy variable that takes 1 if answer is "almost never"/"once or twice a week" and 0 if "three or more times a week"/"almost every day"
  SH2: Used as a 4-level category variable: "almost every day" 4, "three or more times a week" 3, "once or twice a week" 2, and "almost never" 1
  Question: "Do you feel drowsy at work?"
sleep efficiency
  Question: "Do you ever wake up more than an hour earlier than you intended to and then have trouble going back to sleep?"
  SH1: Used as a dummy variable that takes 1 if answer is "almost never"/"once or twice a week" and 0 if "three or more times a week"/"almost every day"
  Question: "Do you have trouble sleeping through the night and wake up in the middle of the night for some reason?"
  SH1: Used as a dummy variable that takes 1 if answer is "rarely"/"less than half" and 0 if "more than half"/"every time"
  SH2: Used as a 4-level category variable: "every time" 4, "more than half" 3, "less than half" 2, and "rarely" 1
total
  SH1: Simply sum six dummy variables and convert to a score of 0-6
  SH2: Convert all items into a composite index by correspondence analysis
https://doi.org/10.1371/journal.pone.0287051.t002
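The SH1 scoring rule described in section 2-2-2 (six dichotomized items summed to a 0-6 score) can be sketched as follows. This is a minimal illustration, not the authors' code: the item keys and the helper names `dichotomize` and `sh1_score` are hypothetical, and the mapping from individual survey questions to the six dummies is an assumption.

```python
# Hypothetical keys for the six sleep health dimensions named in the text.
SH1_ITEMS = (
    "satisfaction", "alertness", "efficiency",
    "duration", "timing", "regularity",
)


def dichotomize(answer, favorable_answers):
    """Return 1 if a 4-point response falls in the favorable half, else 0."""
    return 1 if answer in favorable_answers else 0


def sh1_score(dummies):
    """Sum six 0/1 dummies into a 0-6 score (higher means better sleep health)."""
    values = [dummies[item] for item in SH1_ITEMS]  # KeyError if an item is missing
    if any(v not in (0, 1) for v in values):
        raise ValueError("each SH1 item must be coded 0 or 1")
    return sum(values)


# Illustrative respondent (made-up answers, not study data).
example = {item: 0 for item in SH1_ITEMS}
example["satisfaction"] = dichotomize(
    "satisfied", {"very satisfied", "satisfied"}
)
example["alertness"] = dichotomize(
    "almost every day", {"almost never", "once or twice a week"}
)
example["efficiency"] = dichotomize(
    "once or twice a week", {"almost never", "once or twice a week"}
)
example_score = sh1_score(example)
```

Because every item enters with equal weight and a symmetric cutoff, this kind of score cannot distinguish short from long sleep duration, which is the limitation the text raises as motivation for SH2.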

2-2-4. Control variables.
Considering possible confounding factors that may affect both sleep and work productivity, we control for a variety of work and workplace conditions in the analysis below.
Table 3 lists the basic statistics for the primary data used in the analysis (see S1 Table and S1, S2 Appendices in S1 File for further details of other data used in the analysis).

3-1. Analysis of covariance (ANCOVA)
According to McKenzie [40], the ANCOVA estimator of Eq 1 has more power in RCTs than a difference-in-differences analysis. We therefore employ the ANCOVA estimator to measure the effect of the RCT.
Y_{i,t} is the outcome measure for worker i from the baseline (t = 0) and follow-up (t = 1) surveys, and n_T and n_C are the numbers of workers in the treatment group and in the control group, respectively.
Eq 1 can be estimated by running an OLS regression of Eq 2, where γ̂_ANCOVA and θ̂ of Eq 1 are the estimates of γ and θ of Eq 2, Treatment_i is a binary variable that takes the value of one if individual i is assigned to the treatment group and zero if individual i is assigned to the control group, and ε_i denotes the error term. The outcome measures include the behavioral changes related to sleep, the sleep health indicators SH1 and SH2, six subcategories of sleep health indicators, and two presenteeism measures (Productivity indices 1 and 2). To further improve the power, we control for individual attributes and major events in both work and personal life during the program period described in 2-2-4 by including covariates X_i in Eq 3.
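Eqs 1 to 3 referred to above can be sketched as follows. This is a reconstruction under the assumption that they take the standard ANCOVA form discussed by McKenzie [40]. The intercept α, the coefficient vector β, and the group index sets T and C are notation introduced here for illustration.

```latex
% Eq 1 (assumed form): ANCOVA estimator of the treatment effect
\hat{\gamma}_{ANCOVA}
  = \left( \frac{1}{n_T} \sum_{i \in T} Y_{i,1} - \frac{1}{n_C} \sum_{i \in C} Y_{i,1} \right)
  - \hat{\theta} \left( \frac{1}{n_T} \sum_{i \in T} Y_{i,0} - \frac{1}{n_C} \sum_{i \in C} Y_{i,0} \right)

% Eq 2 (assumed form): OLS regression that yields \hat{\gamma} and \hat{\theta}
Y_{i,1} = \alpha + \gamma \, \mathit{Treatment}_i + \theta \, Y_{i,0} + \varepsilon_i

% Eq 3 (assumed form): Eq 2 augmented with covariates X_i
Y_{i,1} = \alpha + \gamma \, \mathit{Treatment}_i + \theta \, Y_{i,0} + X_i' \beta + \varepsilon_i
```

Under this form, the estimator adjusts the follow-up difference in means by the baseline difference scaled by θ̂, whereas a difference-in-differences estimator would fix that scaling at one.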

3-2. Two-stage least squares (2SLS)
The treatment effect (γ) of the sleep improvement program calculated in Eq 3 is the average effect for all participants (ATT). We might interpret it as the intent-to-treat (ITT) effect because there may be a certain number of participants who did not use the app or follow the advice at all and thus did not change their behavior. In fact, in the follow-up survey, there were participants who stated that the program was too troublesome, and they did not take it very seriously (27.8%). Some said that they got bored in the middle of the program and did not try as hard to comply in the latter half (8.3%) or that they tried to comply, but it depended on their mood (8.3%). On the other hand, 9.0% of the participants answered that they actively tried their best, and 46.5% were able to try to some extent, indicating that just over half of the participants (55.5%) actively participated in the sleep improvement program to some degree. Since the estimation of Eq 3 represents the average effect of the assigned intervention, there could be a large variation in the effects of intervention on sleep and productivity if there is a certain proportion of "noncompliers." Therefore, in what follows, we use 2SLS to estimate the Local Average Treatment Effect (LATE) of diligent participation. Specifically, we used the dummy variable Z_i, which represented individual i's degree of diligence toward the program (the responses to "I was able to actively try my best in the program" and "I was able to try in the program to some extent" were set to 1 and 0 otherwise) as the endogenous right-hand side variable. We then estimated Eq 4 as the first-stage estimation using Treatment_i as the instrument along with covariates X_i. Eq 5 corresponds to the second-stage estimation, where Y_{it} is the outcome measure.
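The two-stage procedure described above can be illustrated with a short sketch on simulated data. Everything here is an assumption made for illustration: the data-generating process, the variable names, and the helper `two_stage_least_squares` are hypothetical and are not the authors' specification. The sketch only shows why random assignment can serve as an instrument for diligent participation when diligence is correlated with unobserved motivation.

```python
import numpy as np


def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def two_stage_least_squares(y, endog, instrument, covariates):
    """2SLS coefficient on one endogenous regressor with one instrument."""
    n = len(y)
    exog = np.column_stack([np.ones(n), covariates])
    # First stage: endogenous regressor on the instrument and exogenous terms.
    first_X = np.column_stack([instrument, exog])
    endog_hat = first_X @ ols(first_X, endog)
    # Second stage: outcome on the fitted regressor and the same exogenous terms.
    second_X = np.column_stack([endog_hat, exog])
    return ols(second_X, y)[0]


# Simulated data (assumed values, not the study's data).
rng = np.random.default_rng(0)
n = 5000
treatment = rng.integers(0, 2, size=n)        # random assignment
motivation = rng.normal(size=n)               # unobserved confounder
diligent = treatment * (motivation > -0.1)    # only treated workers can comply
y0 = rng.normal(size=n)                       # baseline outcome
true_effect = 2.0
y1 = 0.5 * y0 + true_effect * diligent + 0.8 * motivation + rng.normal(size=n)

late_estimate = two_stage_least_squares(
    y1, diligent, treatment, y0.reshape(-1, 1)
)
# Naive OLS of the outcome on diligence absorbs the motivation effect.
naive_estimate = ols(np.column_stack([diligent, np.ones(n), y0]), y1)[0]
```

In this simulation the naive OLS coefficient on diligence is biased upward because motivated workers both comply and perform better, while the instrumented estimate stays close to the assumed true effect.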
In addition, note that only 58 people (approximately 40%) in the treatment group had higher posttreatment sleep satisfaction than preintervention, while 8 people (14%) in the control group reported improved sleep satisfaction. Since less than half of the treatment group experienced improved sleep, it is necessary to determine the extent to which the intervention improved sleep and the extent to which productivity improved through that pathway. Even if the interventions did improve productivity, we do not know if all the improvement was realized through improved sleep health. For example, people may have become more conscious of their health and reviewed their dietary and exercise habits during the program, which in turn might have increased productivity through improved physical condition. It is also possible that the subjects' assignment to the treatment group increased their loyalty to the company and raised their productivity as a result of working more diligently.
If the ATT analysis includes effects through pathways other than sleep improvement, it is not possible to elucidate the extent to which sleep improvement contributed to the improvement in productivity. Therefore, we estimated the causal effect of sleep improvement by 2SLS as follows. Using the sleep health score SH2 as the endogenous right-hand side variable, we estimated Eq 6 as a first-stage estimation using Treatment_i as the instrument along with covariates X_i. Note that S_{i1} and S_{i0} are the posttreatment and pretreatment sleep health scores SH2. Eq 7 corresponds to the second-stage estimation, where Y_{it} is the outcome measure.
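The two 2SLS systems described in this section can be sketched as follows. This is one plausible specification and not necessarily the authors' exact equations: the coefficient symbols (π, λ, α, δ, ρ, θ, β) and error terms (u_i, v_i, ε_i) are notation introduced here, and including the baseline outcome Y_{i,0} in both stages is an assumption carried over from the ANCOVA setup.

```latex
% Diligent participation: first stage, then second stage (assumed forms)
Z_i = \pi_0 + \pi_1 \, \mathit{Treatment}_i + \pi_2 \, Y_{i,0} + X_i' \pi_3 + u_i
Y_{i,1} = \alpha + \delta \, \hat{Z}_i + \theta \, Y_{i,0} + X_i' \beta + \varepsilon_i

% Sleep improvement: first stage, then second stage (assumed forms)
S_{i1} = \lambda_0 + \lambda_1 \, \mathit{Treatment}_i + \lambda_2 \, S_{i0}
         + \lambda_3 \, Y_{i,0} + X_i' \lambda_4 + v_i
Y_{i,1} = \alpha + \rho \, \hat{S}_{i1} + \lambda_5 \, S_{i0} + \theta \, Y_{i,0} + X_i' \beta + \varepsilon_i
```

In both systems the exogenous regressors appear in both stages, and only Treatment_i is excluded from the second stage. That exclusion is what identifies δ and ρ, and it requires that assignment affect productivity only through diligent participation or through sleep health, respectively.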
In what follows, we estimated the impact of the intervention on sleep health scores SH1 and SH2 and then estimated the impact on two productivity indices by ANCOVA and 2SLS through diligent participation in the program and sleep improvement.

4-1. Changes before and after the program
We first present simple graphs to show the extent to which sleep health improved by comparing the pretreatment and posttreatment sleep health indices of the treatment and control groups. Fig 3 shows the mean values of (1) SH1 and (2) SH2 for the treatment and control groups separately for the pretreatment and posttreatment periods. The results of the t tests for the difference of means between the two groups are shown by p values in the figure. The figure shows no significant difference between the two groups for either SH1 or SH2 before treatment. However, we found statistically significant differences at the 1% level for both SH1 and SH2 after treatment, suggesting that the sleep improvement program improved sleep in the treatment group.
In the following, we rigorously examine these results by estimating the three models (ANCOVA and two 2SLS models through diligent participation and sleep improvement) explained in the previous section.

4-2. Treatment effects on sleep-related behavior
We now examine the effect of the program on sleep-related behavior. Using the ANCOVA model in Eq 3, we estimate the treatment effect on the behavioral changes related to sleep, listed as (a)-(i) in section 2-2-1. All variables were scored on a four-point scale, where a lower number is interpreted as less frequent.
Table 4 shows the results. Among the nine sleep-related behaviors, we found a statistically significant improvement for the treatment group in the first two items. According to column (a), participants in the treatment group were 0.42 SD less likely to "do something unrelated to sleep in bed" than those in the control group. Column (b) shows that participants in the treatment group were 0.39 SD less likely to "spend time in a well-lit room until about an hour before going to bed" than those in the control group. Since the dependent variables are categorical, we also estimated ordered logit models and confirmed that the results are qualitatively the same. We did not find any significant effects for other sleep-related behaviors (columns c-i).

4-3. Treatment effects on sleep improvement
4-3-1. Effects on sleep health indices (SH1 and SH2).
Using SH1 and SH2 as dependent variables, we next estimated the effect of the program on sleep health improvement using Eq 3. The first column of Table 5 shows that the SH1 index rose by 0.455 points (0.38 SD) in the treatment group compared to the control group. Additionally, in the second column, the results show that the program improved the sleep health index SH2 in the treatment group by 0.40 points (0.40 SD) compared to the control group. Although the methods of generating SH1 and SH2 are different, it is noteworthy that the extent of improvement in sleep health is approximately the same in terms of standard deviation. In what follows, we will focus on SH2.
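The kind of ANCOVA regression used for these estimates can be sketched on simulated data as follows. This is a minimal illustration under stated assumptions: the covariates X_i are omitted for brevity, the simulated coefficients are arbitrary, and dividing by the baseline standard deviation is only one possible way to express an effect in SD units (the text does not state which standard deviation was used).

```python
import numpy as np

# Simulated data with the study's group sizes (values are made up).
rng = np.random.default_rng(42)
n_t, n_c = 145, 57
treat = np.concatenate([np.ones(n_t), np.zeros(n_c)])
y0 = rng.normal(3.0, 1.2, size=n_t + n_c)                 # baseline score
y1 = 1.0 + 0.6 * y0 + 0.45 * treat + rng.normal(0.0, 1.0, size=n_t + n_c)

# OLS of the follow-up outcome on a constant, the treatment dummy,
# and the baseline outcome.
X = np.column_stack([np.ones_like(treat), treat, y0])
alpha_hat, gamma_hat, theta_hat = np.linalg.lstsq(X, y1, rcond=None)[0]

# Without extra covariates, the OLS coefficient on the treatment dummy equals
# the difference in follow-up means minus theta_hat times the difference in
# baseline means.
is_t = treat == 1
gamma_from_means = (y1[is_t].mean() - y1[~is_t].mean()) - theta_hat * (
    y0[is_t].mean() - y0[~is_t].mean()
)

# Effect expressed in standard deviation units (assumed normalization).
effect_in_sd = gamma_hat / y0.std(ddof=1)
```

The agreement between `gamma_hat` and `gamma_from_means` is an algebraic property of OLS with a constant and a group dummy; once further covariates are added, the treatment coefficient no longer reduces to this two-term expression.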

4-3-2. Effects on SH2 subscales.
To identify which subitems of SH2 improved through the intervention, we further estimate Eq 3 by using the six subscales of SH2 as dependent variables. The results are shown in the third and subsequent columns of Table 5.
A statistically significant effect was confirmed for sleep satisfaction, which rose 0.289 points after the intervention. Regarding daytime alertness, the intervention effect was found to reduce the frequency of "feeling sleepy, such as yawning, drowsy, or tired within 4 hours of waking" by 0.27 (column 4). Regarding sleep efficiency, the intervention was effective in reducing the frequency of "feeling foggy and can't get up right away even when the alarm goes off" by 0.37 (column 8). No statistically significant effects were found for sleep duration, timing, or regularity. These results suggest that the intervention had a positive effect on sleep satisfaction and some other aspects of daytime alertness and sleep efficiency, leading to an improvement in SH2. There are some other notable findings in the table. First, from the results in column 7, an increase in the number of days spent working from home may have caused the participants to experience more "trouble sleeping through the night and waking up in the middle of the night." Telework may have an adverse effect on sleep, such as nocturnal awakening, potentially due to changes in work schedule, such as working until just before going to bed or waking up later. However, no adverse effects other than nocturnal awakening were detected for telework; thus, any adverse effect it has on sleep may be limited. Second, those who had an increase in workload may have experienced a decrease in "falling asleep fast enough to lose consciousness within five minutes" (column 10). Third, for those who were promoted during the RCT implementation period, "feeling sleepy" (column 4) was statistically negative and significant, while "waking up more than an hour early and having trouble going back to sleep" (column 6) was statistically positive and significant. This result can be interpreted as suggesting that immediately after a promotion, workers are more likely to feel nervous and excited by the new tasks they are given and are more likely to have trouble sleeping. No effects were detected for items other than these two, however.

4-4. Treatment effects on productivity
4-4-1. ATT on productivity.
We next estimate Eq 3 using two productivity measures, Productivity index 1 (questionnaire-based productivity composite index) and Productivity index 2 (WLQ). We control for the pretreatment productivity index and other control variables explained in section 2.
Column 2 in Table 6 shows that Treatment has a statistically significant coefficient when using the total WLQ score (Productivity index 2) as the dependent variable. In other words, the sleep program resulted in a 1.45-point higher total WLQ score for the treatment group than for the control group. Productivity index 1 and the subscales of Productivity index 2 show some improvement but only weakly significant differences at best. However, as mentioned in the previous section, the causal effects of the intervention presented in Table 6 might have been weakened because not all participants actively used the app and its advice to improve their sleep-related behaviors. Therefore, in the following sections, we attempt to evaluate the effect of diligent participation on sleep improvement using two-stage least squares.

4-4-2. Effect of diligent efforts on productivity (2SLS).
In what follows, we use the 2SLS method to determine whether diligent participants in the sleep program improved their productivity. This estimate can be interpreted as the local average treatment effect (LATE) of diligent participation as expressed by the model, Eqs 4 and 5, in section 3-2.
The results are shown in Table 7. The endogenous variable, diligent participation, has a statistically significant coefficient when the dependent variable is the total score (Productivity

4-4-3. Effect of sleep improvement on productivity (2SLS).
We next estimated the effect of the intervention on productivity through improvements in the sleep health indices SH1 and SH2. A 2SLS estimation with the posttreatment SH2 as the endogenous variable and the treatment group dummy as the instrumental variable was conducted using Eqs 6 and 7 in section 3-2. Table 8 confirms significant causal effects of improved sleep health (SH2) on Productivity index 1 and Productivity index 2. The latter includes the WLQ's total score and its subscales: time management, mental interpersonal tasks, and output tasks. Among participants in the treatment group, a one standard deviation improvement in the SH2 led to a 0.59-point increase in Productivity index 1 (column 1), a 3.00-point increase in the WLQ total score (column 2), a 10.94-point increase in the time management score (column 3), an 11.37-point increase in the mental interpersonal tasks score (column 5), and a 12.11-point increase in the output tasks score (column 6). We reran the same model estimation using each of the SH2 subscales as an endogenous variable to examine which dimension of the sleep health measure mediated the treatment effect on productivity. Statistically significant effects were found when using satisfaction and daytime alertness as the endogenous variables. To summarize, productivity increased through the improvement of SH2, which was primarily mediated by satisfaction and daytime alertness among six sleep health dimensions.

Table 7. Effect of diligent participation on productivity (LATE).

Note that the purpose of this paper is to examine the relationship between sleep improvement and productivity in the general workforce, rather than in patients with insomnia. Therefore, we did not set any special criteria for recruiting subjects within the company but recruited a wide range of workers who were interested in improving their sleep health. However, we must be aware of the possibility that some of the applicants may have had insomnia or depression, which could substantially affect the effectiveness of the sleep hygiene program. To prevent such cases from affecting our results significantly, we also ran the same regression analysis with a subsample, excluding observations based on several of the criteria (those who had mentioned "sleepless," "need improvement in physical condition," or "Feeling depressed almost every day") as a robustness check. We obtained similar results to those reported in this paper even after excluding those observations.

4-5. Economic return to sleep improvement programs
In section 4-4, we measured the effects on productivity through three pathways: average effects on the treatment group, local average treatment effects of diligent participants, and indirect effects through sleep improvement. Based on those estimators, we evaluated the economic return to sleep improvement programs using the total WLQ score. Although we should look at the net return on investment, the total cost of the sleep improvement program was not disclosed to us by the firm, so we only evaluated the economic benefit.
The following assumptions are made in our calculation.
1. The range of change in the overall WLQ score equals the rate of change in productivity.
2. The average productivity per person per year is 8 million yen (approximately 60,000 US dollars).
3. For the 112 participants in the treatment group who responded to the WLQ, the estimated effect of the intervention continued for one year.
The first assumption is based on our understanding that the WLQ was originally designed to measure the value of productivity loss (Lerner et al. [37]), and the manager in charge of the program at the firm said that he adopted the WLQ survey because he felt that the loss rate of the WLQ was closest to his actual perception of productivity loss. Note that the calculation of the cost of presenteeism using WLQ has been shown in many previous studies. See, for example, Mitchell and Bates [41], who calculated the loss due to presenteeism based on the WLQ after matching people with and without the disease using propensity scores. For assumption 2, we set the average productivity at 8 million yen because the average labor cost of listed manufacturing companies is said to be approximately 8 million yen.
According to Table 6, the effect on the treatment group (ANCOVA) has a coefficient of 1.454, which can be interpreted as an improvement of 1.454 percentage points in the loss of value of WLQ. Our treatment sample contains the total index of WLQ for 112 people, for whom the economic benefit is 0.01454 * 8 million yen * 112 people = 13.03 million yen (97,700 US dollars).
Next, the local average treatment effect (LATE) of diligent participation in Table 7 is 2.622. Although it is not shown in the table, the ratio of participants in the treatment group who made a serious effort to comply was 55.5%. Therefore, the improvement in value through diligent participation is 0.555 * 0.02622 * 8 million yen * 112 participants = 13.04 million yen (97,800 US dollars).
Finally, the coefficient of sleep improvement (i.e., endogenous variable in 2SLS) in Table 8 is 3.00, but the impact of the intervention on the sleep health index SH2 is known to be 0.503 from the first-stage estimates (although not shown in the table). Hence, the economic benefit of improved sleep is 0.503 * 0.03004 * 8 million yen * 112 people = 13.53 million yen (101,500 US dollars). In summary, the economic benefits of all three estimation methods are almost the same, suggesting that the increase in productivity is due to the improved sleep of those who took the intervention seriously (55.5% of all participants), and the other possible pathways were largely negligible.
The most uncertain of our assumptions is the third assumption, the duration of the intervention effect. The Hawthorne effect may have exaggerated the short-term effect of the program, and the improved sleep-related behaviors and productivity gains may be reversed in a few months. Conversely, there is also a possibility that participants who realized the importance of sleep may continue to take steps to improve their sleep, and the effects may last for more than a year.
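The three benefit figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function and variable names are hypothetical, and the inputs are the rounded coefficients quoted in the text (1.454, 2.622 with a 55.5% participation share, and 3.004 with a first-stage effect of 0.503), together with assumptions 1 to 3.

```python
# Assumption 2: average productivity per person per year, in yen.
PRODUCTIVITY_YEN = 8_000_000
# Assumption 3: treatment-group members with WLQ responses, effect lasting one year.
N_TREATED = 112

def annual_benefit_yen(effect_points, scale=1.0):
    """Benefit = scale * (WLQ points / 100) * productivity * headcount.

    Assumption 1: one WLQ point equals one percentage point of productivity.
    `scale` is the participation share or the first-stage coefficient.
    """
    return scale * (effect_points / 100.0) * PRODUCTIVITY_YEN * N_TREATED

benefit_ancova = annual_benefit_yen(1.454)              # average effect on the treated
benefit_late = annual_benefit_yen(2.622, scale=0.555)   # LATE times share of diligent participants
benefit_sleep = annual_benefit_yen(3.004, scale=0.503)  # 2SLS coefficient times first-stage effect

for label, value in [("ANCOVA", benefit_ancova),
                     ("Diligent participation", benefit_late),
                     ("Sleep improvement", benefit_sleep)]:
    print(f"{label}: {value / 1e6:.2f} million yen")
```

With these rounded inputs the third figure comes out at about 13.54 million yen rather than the 13.53 million yen reported in the text; the small gap presumably reflects rounding of the published coefficients.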

4-6. Who improved and who did not?
As mentioned in section 4-3, not all participants in the treatment group improved their sleep health through the sleep improvement program. According to previous studies, the effects and dropouts of RCTs vary according to individual characteristics, personality traits and level of education (see, for example, Bagby et al. [42], Edmonds et al. [43], Schmidt et al. [44]). Given these findings, in this last section, we investigate whether sleep health improvement is more pronounced for individuals with specific attributes. In sections 4-3 and 4-4, most analyses were limited to the participants who responded to the WLQ. In this section, we focus on the 145 participants in the treatment group for whom follow-up survey data are available.
We use two dependent variables that specify those whose sleep has improved. The first is a binary variable in which one is assigned to those whose sleep satisfaction increased by at least one point on a four-point scale and zero otherwise. The second is a binary variable that takes one if the change in the sleep health score SH2 after the treatment corresponds to the top 40th percentile and zero otherwise. These variables are regressed on explanatory variables in the logit model. For the covariates, we employ responses to the following questions in the baseline survey: (question 1) "Please answer frankly about your current state of mind when starting the program" and (question 2) "If you have made health-related habitual efforts (exercise, diet, etc.) in the past, please select an answer that most closely matches your own tendency." The options for question 1 are "I don't like to start new things", "I am interested in starting new things, but I don't want to change my life too much", "I want to challenge new things as much as I can", and "I want to challenge new things aggressively even if it is a little difficult". Participants are asked to choose one of the four options for questions 1 and 2.
We create a dummy variable that is assigned one for those who chose the last option, "Challenge aggressively", which corresponds to the highest willingness for question 1, and zero for the others, as a proxy variable for the person's willingness to make a serious effort. For question 2, the options are "I get bored in the middle and don't continue to the end," "I can continue if I have the right support," "I can continue to work hard alone to achieve my goal," and "I feel a sense of accomplishment in completing the task and can go on and on by myself." We generate a dummy that takes one for those who chose "a sense of accomplishment in completing the task (highest perseverance)" and zero for the others as a proxy for the person's perseverance. In addition to basic attributes such as gender, age, and family structure, major changes in work and living conditions during the program period and sleep conditions before the program started are added as explanatory variables.
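The variable construction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names are hypothetical, survey options are assumed to be coded 1 to 4 in the order listed, and "top 40th percentile" is read as the top 40% of SH2 changes in the sample. The logit estimation itself is not shown.

```python
def satisfaction_improved(pre, post):
    """1 if sleep satisfaction rose by at least one point on the four-point scale."""
    return 1 if post - pre >= 1 else 0

def top_share_flags(changes, share=0.4):
    """1 for observations whose change is in the top `share` of the sample.

    Ties at the cutoff are all flagged, so slightly more than `share`
    of the sample can receive a one.
    """
    k = int(share * len(changes))           # target number of flagged observations
    if k == 0:
        return [0] * len(changes)
    cutoff = sorted(changes, reverse=True)[k - 1]
    return [1 if c >= cutoff else 0 for c in changes]

def highest_option_dummy(answer, n_options=4):
    """1 if the respondent chose the last (strongest) of the four options."""
    return 1 if answer == n_options else 0

# Made-up SH2 changes for ten participants (illustrative values only).
sh2_changes = [0.5, -0.2, 1.1, 0.0, 0.3, 0.9, -0.4, 0.2, 0.7, 0.1]
sh2_flags = top_share_flags(sh2_changes)
print(sh2_flags)
```

A threshold rule like this keeps the dependent variable binary, which is what the logit model requires, at the cost of discarding the size of each participant's improvement.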
The estimation results are presented in Table 9. The variables of willingness and perseverance are shown to have a significantly positive effect on sleep satisfaction (columns 1 to 3), while both are less significantly associated with SH2 (columns 4 to 6). The analysis also revealed that the probability of sleep improvement was higher for those with lower prior sleep satisfaction (columns 1 to 3) or sleep health conditions (columns 4 to 6). In addition, we find that it is difficult to improve sleep satisfaction for those in their 40s and older.

Discussion and conclusion
Using a randomized controlled trial of a sleep improvement program in a firm's workforce, this paper examined the extent to which improved sleep health increases work productivity. The results of the analysis revealed the following. First, participants in the sleep improvement program showed a statistically significant improvement in sleep health compared to the control group after accounting for a variety of work and lifestyle factors that may affect sleep. Second, the treatment group also exhibited a statistically significant improvement in work productivity, which was fully explained by the effect through sleep improvement. The total effect sizes for productivity improvement were similar whether we used ANCOVA or the 2SLS methods that capture the effect through diligent participation or sleep improvement. Although it is possible for anyone to experience worsening sleep due to various changes and events in their work and personal lives, promoting sleep health through the use of information technology such as sensing devices may help improve sleep deficiency and restore productivity. However, the results of this paper also reveal that the effect is heterogeneous depending on individual characteristics such as age and willingness and perseverance to improve sleep health. When implementing proactive corporate health interventions, it is important to plan ahead, identifying groups of employees who might benefit most from interventions and incorporating additional nudges to support those who have difficulty changing their behavior.
Currently, many developed countries are experiencing an aging population. Japan has the oldest population in the world, with the proportion of people aged 65 and over recorded at 28.7% of the total population in 2020, and this proportion is projected to rise to 35.3% by 2040. To cope with the aging of the population, the government is requiring companies to try to retain their employees until the age of 70 under the revised Act on Stabilization of Employment of Elderly Persons, in effect from April 2021. There is concern that as the number of people with sleep problems increases with age (see, for example, Léger et al. [45], Dregan and Armstrong [46,47], van de Straat and Bracke [5]), their productivity will also be affected. According to the National Survey on Basic Living Conditions (Ministry of Health, Labour and Welfare 2019), the proportion of people who do not get enough rest from sleep, as mentioned at the beginning of this paper, is reported to increase with age. In addition, many Japanese people sleep less (Walch et al. [48]), and estimates based on the Survey on Time Use and Leisure Activities show that sleep duration among full-time workers has been on a downward trend for the past 30 years (Kuroda [49]). A similar downward trend of sleep duration in the United States was also reported by Sheehan [50] (for an international comparison of sleep duration, see Whinnery et al. [51], Kuula et al. [52]). In addition to the aging of the population, we cannot ignore the impact of the global shift to a "24-hour economy" and the accompanying extension of working hours due to the spread of smartphones and telecommuting. We now face blurred boundaries between life and work, leading to a deterioration in sleep regularity and an increase in the number of people with sleep deficiency.
As this paper has shown, poor sleep health leads to a decline in productivity, which is a major loss for the economy. The problem, however, is that when a person experiences chronically poor sleep, he or she may feel it is normal, and it becomes difficult for the person to recognize the decline in productivity. Additionally, even if the person is aware of productivity loss, companies may not be able to detect presenteeism. A person can see a doctor if he or she has extreme insomnia, but few people will see a doctor for a slight loss of productivity if it does not interfere with their daily lives. As a result, people tend to neglect their daily sleep, and even if they want to get a good night's sleep, they are unlikely to actively gather information to improve their sleep health or examine related bad habits that negatively affect sleep. Companies should understand that the importance of sleep health is not easily recognized by individuals and may consider implementing proactive health management, such as providing sleep improvement measures or subsidizing the cost of sleep technology for employees.
Finally, we discuss the limitations of our paper. First, we conducted an RCT on white-collar day workers as the main target group, and productivity improvements through sleep improvement were confirmed for those workers. However, it is not clear whether similar improvements can be expected for shift workers and blue-collar workers in various industries, such as construction, manufacturing, and transportation. It is necessary to verify the results by expanding the target occupations. Second, this paper targets workers who are interested in improving their sleep, and the sample size is not large (approximately 200 workers). It is necessary to examine the results of a larger sample including those who have sleep problems but are not interested in the sleep health measures or those who are strongly resistant to the interventions. Third, the effects of the RCT analyzed in this paper were based on data immediately after the program was implemented, and it is unclear how long the changes in the participants' behavior to improve their sleep might last. Even if the advice given by the sleep app was temporarily beneficial, the improved behaviors may be reversed as soon as individuals stop using the app or sleep measurement device. Fourth, it is also necessary to examine whether productivity gains due to improved sleep are a temporary phenomenon or whether they persist. It will be important to monitor changes over time by conducting multiple follow-up surveys of RCT subjects. Fifth, the usage of actigraphy data recorded by the sleep device in the treatment group is another future challenge. We created the third sleep health score (SH3) using the daily sleep data recorded for the treatment group to examine the impact on presenteeism. We conducted the analysis limiting samples only to the treatment group by accounting for possible selection biases (note that the actigraphy data were only available for the treatment group). There was no statistically significant effect on either productivity index 1 or 2 through the improvement of SH3. This is not necessarily due to limited variation in sleep health within the treatment group because when a similar analysis was conducted for SH2 using only treatment group samples, productivity index 1 was significantly associated with the improvement in SH2. There is a possibility that actigraphy sleep data do not sufficiently capture the factors that affect sleep satisfaction. According to Tal et al. [35], the device had a sensitivity of 92.5% in determining sleep/awake state. When breaking sleep into rapid eye movement (REM), light sleep and slow wave sleep, those sensitivities dropped to 53.7%, 64.9%, and 56.2%, respectively. In the future, it may be necessary to explore more effective ways to utilize actigraphy sleep data, such as the use of machine learning. These points remain to be addressed.

Fig 3. Changes before and after the intervention. (Notes) The graphs show the mean values of Sleep Health Scores 1 and 2 (SH1 and SH2) of 145 participants in the treatment group and 57 participants in the control group at each time point before and after the program. The larger the value on the vertical axis, the better the sleep health. The p values shown in the figure are the results of significance tests comparing the two groups before and after the program: there was no statistically significant difference between the two groups before the intervention, whereas the sleep improvement in the treatment group was statistically significant at the 1% level after the intervention. https://doi.org/10.1371/journal.pone.0287051.g003

Table 4. Effect of intervention on behavioral change in sleep.
The variables described in section 2.2.4 are used as control variables in the estimation but are omitted from the table. https://doi.org/10.1371/journal.pone.0287051.t004

Table 5. Effect of intervention on sleep improvement.
The variables described in section 2.2.4 are used as control variables in the estimation, but only some of them are included in the table.

Table 6. Effect of intervention on productivity improvement (ANCOVA).
(Notes) 1. Robust standard errors in parentheses. 2. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels. 3. The variables described in section 2.2.4 are used as control variables in the estimation but are omitted from the table. https://doi.org/10.1371/journal.pone.0287051.t006

Table 9. Relationship between sleep health improvement and individual attributes.
(Notes) 1. Robust standard errors in parentheses. 2. ***, **, and * indicate statistical significance at the 1%, 5%, and 10% levels. 3. The purpose at the start of the program (multiple responses) and major changes in work and life during the program period are used as control variables but are omitted from the table. https://doi.org/10.1371/journal.pone.0287051.t009